- Title
- Sequence count data are poorly fit by the negative binomial distribution
- Creator
- Hawinkel, Stijn; Rayner, J. C. W.; Bijnens, Luc; Thas, Olivier
- Relation
- PLoS One Vol. 15, Issue 4, no. e0224909
- Publisher Link
- http://dx.doi.org/10.1371/journal.pone.0224909
- Publisher
- Public Library of Science
- Resource Type
- journal article
- Date
- 2020
- Description
- Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods.
- Subject
- binomial distribution; microbiota; RNA; Poisson distribution; regression analysis
- Identifier
- http://hdl.handle.net/1959.13/1437981
- Identifier
- uon:40526
- Identifier
- ISSN:1932-6203
- Language
- eng
- Reviewed